Abstract

The goal for the Directed Steiner Tree problem is to find a minimum cost tree in a directed graph $G = (V,E)$ that connects all terminals $X$ to a given root $r$. It is well known that modulo a logarithmic factor it suffices to consider acyclic graphs where the nodes are arranged in $\ell \le \log|X|$ levels. Unfortunately the natural LP formulation has a $\Omega(\sqrt{|X|})$ integrality gap already for 5 levels. We show that for every $\ell$, the $O(\ell)$-round Lasserre Strengthening of this LP has integrality gap $O(\ell \log|X|)$. This provides a polynomial time $|X|^{\varepsilon}$-approximation and a $O(\log^3|X|)$ approximation in $O(n^{\log|X|})$ time, matching the best known approximation guarantee obtained by a greedy algorithm of Charikar et al.

1 Introduction

Most optimization problems that appear in combinatorial optimization can be written as an integer linear program, say in the form $\min\{c^T x \mid Ax \ge b;\ x \in \{0,1\}^n\}$, where the system $Ax \ge b$ represents the problem structure. Since $c$ is linear, this is optimizing over the convex hull $K_I := \mathrm{conv}(K \cap \{0,1\}^n)$, where $K := \{x \in \mathbb{R}^n \mid Ax \ge b\}$ denotes a polyhedron.

Such optimization problems are NP-hard in general, thus a standard approach for obtaining approximate solutions is to optimize instead over the relaxation $K$ and then try to extract a close integral solution. This approach yields in many cases solutions whose quality matches the lower bound provided by the PCP Theorem or the Unique Games Conjecture (which is the case e.g. for Set Cover [Lov75, Fei98], Vertex Cover [KR08] and Facility Location [GK99, Li11]). However, there is a significant number of problems where the integrality gap between $K$ and $K_I$ appears to be far higher than the approximability of the problem, so that a stronger formulation is needed.

At least in the field of approximation algorithms, researchers have so far mostly preferred problem-specific inequalities to lower the integrality gap (a nice example is the $O(1)$-apx for Min-Sum Set Cover [BGK10]). However, there are very general techniques that can be used to strengthen the convex relaxation $K$.



* Supported by the Alexander von Humboldt Foundation within the Feodor Lynen program, by ONR grant N00014-11-1-0053 and by NSF contract CCF-0829878.

Especially in the field of (computational) integer programming, the approach of cutting planes is very popular. In the Gomory-Chvátal Closure $CG(K) \subseteq K$ one adds simultaneously cuts $a^T x \le \lfloor \beta \rfloor$ for all valid inequalities $a^T x \le \beta$ with $a \in \mathbb{Z}^n$. On the positive side, after at most $O(n^2 \log n)$ iterative applications of the closure operation, one reaches $K_I$ [ES99] (assuming that $K \subseteq [0,1]^n$). But the drawback is that already optimizing over the first closure $CG(K)$ is coNP-hard [Eis99]. Singh and Talwar [ST10] studied the effect of Gomory-Chvátal cuts on the integrality gap of hypergraph matchings and other problems.

However, more promising for the sake of approximation algorithms are probably LP/SDP hierarchies like the ones of Balas, Ceria, Cornuéjols [BCC93]; Lovász, Schrijver [LS91] (with LP-strengthening LS and an SDP-strengthening LS+); Sherali, Adams [SA90] or Lasserre [Las01a, Las01b]. On the $t$-th level, they all use $n^{O(t)}$ additional variables to strengthen $K$ (thus the term Lift-and-Project Methods) and they all can be solved in time $n^{O(t)}$. Moreover, for $t = n$ they define the integral hull $K_I$ and for any set of $|S| \le t$ variables, a solution $x$ can be written as a convex combination of vectors from $K$ that are integral on $S$. Despite these similarities, the Lasserre SDP relaxation is strictly stronger than all the others. We refer to the survey of Laurent [Lau03a] for a detailed comparison.

Up to now, there have been few (positive) results on the use of hierarchies in approximation algorithms. One successful application of Chlamtac [Chl07] uses the 3rd level of the Lasserre relaxation to find $O(n^{0.2072})$-colorings for 3-colorable graphs. It lies in the range of possibilities that $O(\log n)$ levels of Lasserre might be enough to obtain a coloring with $O(\log n)$ colors in 3-colorable graphs [ACC06]. In fact, for special graph classes, there has been recent progress by Arora and Ge [AG11]. Chlamtac and Singh [CS08] showed that $O(1/\gamma^2)$ rounds of a mixed hierarchy can be used to obtain an independent set of size $n^{\Omega(\gamma^2)}$ in a 3-uniform hypergraph, whenever it has an independent set of size $\gamma n$. After a constant number of rounds of Sherali-Adams, the integrality gap for the matching polytope reduces to $1 + \varepsilon$ [MS09]. The same is true for MaxCut in dense graphs (i.e. graphs with $\Omega(n^2)$ edges) [dlVKM07]. The Sherali-Adams hierarchy is also used in [BCG09] to find degree lower-bounded arborescences.

Guruswami and Sinop provide approximation algorithms for quadratic integer programming problems whose performance guarantees depend on the eigenvalues of the graph Laplacian [GS11]. Also the Lasserre-based approach of [BRS11] for Unique Games depends on the eigenvalues of the underlying graph adjacency matrix. Though the $O(\sqrt{\log n})$-apx of Arora, Rao and Vazirani [ARV04] for Sparsest Cut does not explicitly use hierarchies, their triangle inequality is implied by $O(1)$ rounds of Lasserre. For a more detailed overview on the use of hierarchies in approximation algorithms, see the recent survey of Chlamtac and Tulsiani [CT11].

Moreover, integrality gap lower bounds exist for various problems [Lau03b, AAT05, Tou06, STT07a, STT07b, GMPT07, Sch08, CMM09, Tul09]. To name only a few of these results, even a linear number of Lasserre rounds cannot refute unsatisfiable constraint satisfaction problems [Sch08] and the gap for Graph Coloring is still $k$ versus $2^{\Omega(k)}$ [Tul09]. In contrast, the LP-based hierarchies LS and SA cannot even reduce the MaxCut gap below $2 - \varepsilon$ after $\Omega(n)$ [STT07b] and $n^{\Omega(1)}$ [CMM09] many rounds, respectively. Recall that already a single round of the SDP based hierarchies reduces the gap to 1.13.

In this paper, we apply the Lasserre relaxation to the flow-based linear programming relaxation of Directed Steiner Tree. The input for this problem consists of a directed graph $G = (V,E)$ with edge cost $c : E \to \mathbb{R}_+$, a root $r \in V$ and terminals $X \subseteq V$. The goal is to compute a subset $T \subseteq E$ of edges such that there is an $r$-$s$ path in $T$ for each terminal $s$. Note that the cheapest such set always forms a tree (see Figure 1).

Figure 1: Illustration of layered graph $G$ with $\ell = 3$. Edges are labelled with their cost. Black edges denote an optimum Directed Steiner Tree solution. Rectangles depict terminals.

By a straightforward reduction from Set Cover one easily sees that the problem is $\Omega(\log n)$-hard [Fei98] to approximate. Zelikovsky [Zel97] obtained a $|X|^{\varepsilon}$-approximation for every constant $\varepsilon > 0$ using a greedy approach. In fact, he provided the following useful insight:

Theorem 1 ([Zel97, CZ05]). For every $\ell \ge 1$, there is a tree $T$ (potentially using edges in the metric closure) of cost $c(T) \le \ell \cdot |X|^{1/\ell} \cdot OPT$ such that every $r$-$s$ path (with $s \in X$) in $T$ contains at most $\ell$ edges.

In other words, at the cost of a factor$^1$ $\ell \cdot |X|^{1/\ell}$ in the approximation guarantee, one may assume that the graph is acyclic and the nodes are arranged in $\ell$ levels$^2$ (observe that for $\ell = \log|X|$, one has $\ell \cdot |X|^{1/\ell} = \ell \cdot (2^{\log|X|})^{1/\log|X|} = O(\log|X|)$). Later Charikar, Chekuri, Cheung, Goel, Guha and Li [CCC+99] gave a $O(\log^3 |X|)$ approximation in time $n^{O(\log|X|)}$, again using a sophisticated greedy algorithm. So far, methods based on linear programming were less successful. In fact, Zosin and Khuller [ZK02] show that the natural flow based LP relaxation has an integrality gap of $\Omega(\sqrt{|X|})$. The importance of Directed Steiner Tree lies in the fact that it generalizes a huge number of problems, e.g. Set Cover, (non-metric, multilevel) Facility Location and Group Steiner Tree. For the latter problem, the input consists of an undirected (weighted) graph $G = (V,E)$, groups $G_1, \ldots, G_k \subseteq V$ of terminals and a root $r \in V$. The goal here is to find a tree $T \subseteq E$ of minimum cost $c(T)$ that connects at least one terminal from every group to the root. The state of the art for Group Steiner Tree is still the elegant approach of Garg, Konjevod and Ravi [GKR00]. Using $O(\log n)$-average distortion tree embeddings, one can assume that the input graph itself is a tree. Then [GKR00] solve a flow based LP and provide a rounding scheme, which gives a $O(\log^2 n)$-approximation w.r.t. the optimum solution in the tree graph. Surprisingly, [HKK+03] found a tree instance which indeed has an integrality gap of $\Omega(\log^2 n)$. Later Halperin and Krauthgamer [HK03] even proved a $\Omega(\log^{2-\varepsilon} n)$ inapproximability for Group Steiner Tree (on tree graphs) and in turn also for Directed Steiner Tree.
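As a quick sanity check of the level-reduction factor $\ell \cdot |X|^{1/\ell}$ from Theorem 1, the following Python sketch evaluates it for two choices of $\ell$. The function name `height_reduction_factor` and the use of a base-2 logarithm are assumptions made purely for illustration.

```python
import math

def height_reduction_factor(num_terminals: int, levels: int) -> float:
    """Factor l * |X|^(1/l) lost when restricting to l levels (Theorem 1)."""
    return levels * num_terminals ** (1.0 / levels)

# With |X| = 1024 and l = log2|X| = 10 levels the factor is 10 * 2 = 20,
# i.e. logarithmic in |X|; with only 2 levels it is 2 * sqrt(1024) = 64.
num_terminals = 1024
levels = int(math.log2(num_terminals))
factor_log_levels = height_reduction_factor(num_terminals, levels)
factor_two_levels = height_reduction_factor(num_terminals, 2)
```

The trade-off is visible directly: few levels give a polynomial loss, while $\ell = \log|X|$ levels give only a logarithmic loss at the price of a deeper instance.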

$^1$The claim in [Zel97] was initially $|X|^{1/\ell}$, which is incorrect, though it was later on heavily used in the literature. In a later paper, Călinescu and Zelikovsky [CZ05] change the claim to $\ell \cdot |X|^{1/\ell}$. As a consequence, the $O(\log^2|X|)$-apx in [CCC+99] has to be changed to a $O(\log^3 |X|)$-apx.

$^2$This can easily be achieved by taking $\ell + 1$ copies of the node set in the original graph and inserting cost-0 edges between copies of the same node.






Our contribution 

Many researchers have failed in designing stronger LP relaxations for Directed Steiner Tree (see Alon, Moitra and Sudakov [AMS12] for a counterexample to a promising approach). We make partial progress by showing that in an $\ell$-level graph, already $O(\ell)$ rounds of the Lasserre hierarchy drastically reduce the integrality gap of the natural flow-based LP for Directed Steiner Tree from $\Omega(\sqrt{|X|})$ (for $\ell \ge 5$) down to $O(\ell \log|X|)$. This gives an alternative polylogarithmic approximation in quasi-polynomial time (and the first one that is based on convex relaxations).

In this paper, we try to promote the application of hierarchies in approximation algo- 
rithms. For this sake, we demonstrate how the Lasserre relaxation can be used as a black box 
in order to obtain powerful (yet reasonably simple) approximation algorithms. 

From a technical viewpoint we adapt the rounding scheme of Garg, Konjevod and Ravi [GKR00]. Though their algorithm and analysis crucially rely on the fact that the input graph itself is a tree, it turns out that one can use instead the values of the auxiliary variables in the $O(\ell)$-round SDP to perform the rounding. Another ingredient for our analysis is the recent Decomposition Theorem of Karlin, Mathieu and Nguyen [KMN11] for the Lasserre hierarchy.

2 The Lasserre Hierarchy 

In this section, we provide a definition of the Lasserre hierarchy and all properties that are necessary for our purpose. In our notation, we mainly follow the survey of Laurent [Lau03a]. Let $\mathcal{P}_t([n]) := \{I \subseteq [n] \mid |I| \le t\}$ be the set of all index sets of cardinality at most $t$ and let $y \in \mathbb{R}^{\mathcal{P}_{2t+2}([n])}$ be a vector with entries $y_I$ for all $I \subseteq [n]$ with $|I| \le 2t + 2$. Intuitively $y_{\{i\}}$ represents the original variable $x_i$ and the new variables $y_I$ represent $\prod_{i \in I} x_i$. We define the moment matrix $M_{t+1}(y) \in \mathbb{R}^{\mathcal{P}_{t+1}([n]) \times \mathcal{P}_{t+1}([n])}$ by

\[ (M_{t+1}(y))_{I,J} := y_{I \cup J} \quad \forall |I|, |J| \le t + 1. \]

For a linear constraint $a^T x \ge \beta$ with $a \in \mathbb{R}^n$ and $\beta \in \mathbb{R}$ we define $\binom{a}{\beta} * y$ as the vector $z$ with$^3$

\[ z_I := \sum_{i \in [n]} a_i y_{I \cup \{i\}} - \beta y_I. \]

Definition 1. Let $K = \{x \in \mathbb{R}^n \mid Ax \ge b\}$. We define the $t$-th level of the Lasserre hierarchy $\mathrm{Las}_t(K)$ as the set of vectors $y \in \mathbb{R}^{\mathcal{P}_{2t+2}([n])}$ that satisfy

\[ M_{t+1}(y) \succeq 0; \qquad M_t\big(\tbinom{A_\ell}{b_\ell} * y\big) \succeq 0 \;\; \forall \ell \in [m]; \qquad y_\emptyset = 1. \]

Furthermore, let $\mathrm{Las}_t^{\mathrm{proj}}(K) := \{(y_{\{1\}}, \ldots, y_{\{n\}}) \mid y \in \mathrm{Las}_t(K)\}$ be the projection on the original variables.

Intuitively, the PSD-constraint $M_t(\binom{A_\ell}{b_\ell} * y) \succeq 0$ guarantees that $y$ satisfies the $\ell$-th linear constraint, while $M_{t+1}(y) \succeq 0$ takes care that the variables are consistent (e.g. it guarantees that $y_{\{1,2\}} \in [y_{\{1\}} + y_{\{2\}} - 1, \min\{y_{\{1\}}, y_{\{2\}}\}]$). The Lasserre hierarchy can even be applied to non-convex semi-algebraic sets, but for the sake of a simple presentation we stick to polytopes.

$^3$This notation was initially introduced for multivariate degree-one polynomials $g(x) = \sum_{I \subseteq [n]} g_I \prod_{i \in I} x_i$ which induce constraints of the form $g(x) \ge 0$. In this general case, one defines $(g * y)_I := \sum_{K \subseteq [n]} g_K \cdot y_{I \cup K}$. Note that for a linear constraint $a^T x \ge \beta$, one has $g_{\{i\}} = a_i$, $g_\emptyset = -\beta$ and $g_I = 0$ for $|I| > 1$. We stick to this notation to be consistent with the existing literature.

Fortunately, one can use the Lasserre relaxation conveniently as a black-box. We list all properties that we need for our approximation algorithm:



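Theorem 2. Let $K = \{x \in \mathbb{R}^n \mid Ax \ge b\}$ and $y \in \mathrm{Las}_t(K)$. Then the following holds:

(a) $\mathrm{conv}(K \cap \{0,1\}^n) = \mathrm{Las}_n^{\mathrm{proj}}(K) \subseteq \mathrm{Las}_{n-1}^{\mathrm{proj}}(K) \subseteq \ldots \subseteq \mathrm{Las}_0^{\mathrm{proj}}(K) \subseteq K$.

(b) One has $0 \le y_I \le y_J \le 1$ for all $I \supseteq J$ with $0 \le |J| \le |I| \le t$.

(c) Let $I \subseteq [n]$ with $|I| \le t$. Then $K \cap \{x \in \mathbb{R}^n \mid x_i = 1 \; \forall i \in I\} = \emptyset \Rightarrow y_I = 0$.

(d) Let $I \subseteq [n]$ with $|I| \le t$. Then $y \in \mathrm{conv}(\{z \in \mathrm{Las}_{t-|I|}(K) \mid z_{\{i\}} \in \{0,1\} \; \forall i \in I\})$.$^4$

(e) Let $S \subseteq [n]$ be a subset of variables such that $\max\{|I| : I \subseteq S;\ x \in K;\ x_i = 1 \; \forall i \in I\} \le k < t$. Then $y \in \mathrm{conv}(\{z \in \mathrm{Las}_{t-k}(K) \mid z_{\{i\}} \in \{0,1\} \; \forall i \in S\})$.

(f) For any $|I| \le t$ one has $y_I = 1 \Leftrightarrow \bigwedge_{i \in I} (y_{\{i\}} = 1)$.

(g) For $|I| \le t$: $(\forall i \in I : y_{\{i\}} \in \{0,1\}) \Rightarrow y_I = \prod_{i \in I} y_{\{i\}}$.

(h) Let $|I|, |J| \le t$ and $y_I = 1$. Then $y_{I \cup J} = y_J$.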

Proof. Proofs of (a), (b), (d) can be found in Laurent [Lau03a]. (e) is the Decomposition Theorem of [KMN11]. (c), (f) and (g) follow easily from (b) and (d). For (h), consider the principal submatrix

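\[ M = \begin{pmatrix} 1 & 1 & y_J \\ 1 & 1 & y_{I \cup J} \\ y_J & y_{I \cup J} & y_J \end{pmatrix} \]

of $M_{t+1}(y)$ that is induced by indices $\{\emptyset, I, J\}$ (substituting $y_I$ with 1). Then $\det(M) = -(y_J - y_{I \cup J})^2 \ge 0$ implies that $y_J = y_{I \cup J}$. □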
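The determinant identity used for property (h) can be verified mechanically. The following Python sketch is only an illustration (the helper names `det3` and `submatrix_h` are hypothetical); it evaluates the $3 \times 3$ principal submatrix on a grid of exact rational values and compares with $-(y_J - y_{I \cup J})^2$.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix via cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def submatrix_h(y_j, y_ij):
    """Principal submatrix indexed by (emptyset, I, J), with y_I replaced by 1."""
    return [[1, 1, y_j],
            [1, 1, y_ij],
            [y_j, y_ij, y_j]]

# Exact check on a small grid of rational values in [0, 1].
grid = [Fraction(k, 4) for k in range(5)]
identity_holds = all(
    det3(submatrix_h(y_j, y_ij)) == -(y_j - y_ij) ** 2
    for y_j in grid for y_ij in grid
)
```

Since a positive semidefinite matrix has nonnegative principal minors, the determinant can only be nonnegative when $y_J = y_{I \cup J}$, which is exactly the argument in the proof.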

Though all these properties are well known, to be fully self contained, we provide a complete introduction with proofs of the non-trivial statements (a), (d), (e) in the appendix.

Especially (e) is a remarkably strong property that does not hold for the Sherali-Adams or Lovász-Schrijver hierarchy (see [KMN11]). For example, it implies that after $t = O(1/\varepsilon)$ rounds, the integrality gap for the Knapsack polytope is bounded by $1 + \varepsilon$ (taking $S$ as all items that have profit at least $\varepsilon \cdot OPT$). The same bound holds for the Matching polytope $\{x \in \mathbb{R}^E_+ \mid x(\delta(v)) \le 1 \; \forall v \in V\}$ (since Property (e) implies all Blossom inequalities up to $2t + 1$ nodes). Another immediate consequence is that the Independent Set polytope $\{x \in \mathbb{R}^V_+ \mid x_u + x_v \le 1 \; \forall \{u, v\} \in E\}$ describes the integral hull after $\alpha(G)$ rounds of Lasserre (where $\alpha(G)$ is the stable set number of the considered graph).

$^4$Formally spoken, vectors in $\mathrm{Las}_{t-|I|}$ have fewer dimensions than $y$. Thus it would be more correct to write $z|_{\mathcal{P}_{2(t-|I|)+2}([n])} \in \mathrm{Las}_{t-|I|}(K)$, where $z|_{\mathcal{P}_{2t+2-2|I|}([n])}$ denotes the restriction of $z$ to all entries $J$ with $|J| \le 2(t - |I|) + 2$.

3 The linear program

The natural LP formulation for Directed Steiner Tree sends a unit flow from the root to each terminal $s \in X$ (represented by variables $f_{s,e}$). The amount of capacity that has to be paid on edge $e$ is $y_e = \max\{f_{s,e} \mid s \in X\}$. We abbreviate $\delta^+(v) := \{(v, u) \mid (v, u) \in E\}$ ($\delta^-(v) := \{(u, v) \mid (u, v) \in E\}$, resp.) as the edges outgoing (ingoing, resp.) from $v$ and $y(E') := \sum_{e \in E'} y_e$.
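The LP is

\[
\begin{array}{rcll}
\min \displaystyle\sum_{e \in E} c_e y_e & & & \\[2mm]
\displaystyle\sum_{e \in \delta^+(v)} f_{s,e} - \sum_{e \in \delta^-(v)} f_{s,e} &=& \begin{cases} 1 & v = r \\ -1 & v = s \\ 0 & \text{otherwise} \end{cases} & \forall s \in X \; \forall v \in V \\[2mm]
f_{s,e} &\le& y_e & \forall s \in X \; \forall e \in E \\
y(\delta^-(v)) &\le& 1 & \forall v \in V \\
0 \le y_e &\le& 1 & \forall e \in E \\
0 \le f_{s,e} &\le& 1 & \forall s \in X \; \forall e \in E
\end{array}
\]
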

Note that we have an additional constraint $y(\delta^-(v)) \le 1$ (i.e. only one ingoing edge for each node) which is going to help us in the analysis. Let $K \subseteq \mathbb{R}^E \times \mathbb{R}^{X \times E}$ be the set of fractional solutions. This LP has an integrality gap$^5$ of $\Omega(\sqrt{|X|})$ even if the number of layers is 5 [ZK02]. From now on, we make the choice $t := 2\ell$, i.e. we consider the $2\ell$-round Lasserre strengthening of the above LP. Let $\mathcal{V}_t = \{(P, H) \mid P \subseteq E;\ H \subseteq X \times E;\ |P| + |H| \le 2t + 2\}$ be the set of variable indices for the $t$-th level of Lasserre. In other words $\mathrm{Las}_t(K) \subseteq [0,1]^{\mathcal{V}_t}$. Let $Y = (Y_{P,H})_{(P,H) \in \mathcal{V}_t} \in \mathrm{Las}_t(K)$ be an optimum solution for the Lasserre relaxation, which can be computed in time $n^{O(t)}$. We abbreviate $OPT_f := \sum_{e \in E} c_e y_{\{e\}}$ as the objective function value.
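To make the constraints of the LP concrete, the following Python sketch checks whether a given pair of capacities and flows satisfies them. It is only an illustration under assumed data structures (the function `is_lp_feasible`, the dictionaries `y` and `f`, and the tiny example instance are all hypothetical); it does not solve the LP or its Lasserre strengthening.

```python
def is_lp_feasible(nodes, edges, root, terminals, y, f, tol=1e-9):
    """Check the flow-based LP constraints for capacities y[e] and flows f[s, e].

    edges is a list of (u, v) pairs; y maps edges to values; f maps
    (terminal, edge) pairs to values, missing entries count as 0.
    """
    def flow(s, e):
        return f.get((s, e), 0.0)

    for e in edges:
        if not -tol <= y[e] <= 1 + tol:                 # 0 <= y_e <= 1
            return False
    for v in nodes:
        # Additional constraint: at most one unit of capacity enters v.
        if sum(y[e] for e in edges if e[1] == v) > 1 + tol:
            return False
    for s in terminals:
        for e in edges:
            # 0 <= f_{s,e} <= 1 and f_{s,e} <= y_e.
            if not -tol <= flow(s, e) <= min(1.0, y[e]) + tol:
                return False
        for v in nodes:
            out_flow = sum(flow(s, e) for e in edges if e[0] == v)
            in_flow = sum(flow(s, e) for e in edges if e[1] == v)
            demand = 1 if v == root else (-1 if v == s else 0)
            if abs(out_flow - in_flow - demand) > tol:  # flow conservation
                return False
    return True

# Tiny 2-level instance: r -> a -> {s1, s2}, all edges bought integrally.
nodes = ["r", "a", "s1", "s2"]
edges = [("r", "a"), ("a", "s1"), ("a", "s2")]
y = {e: 1.0 for e in edges}
f = {("s1", ("r", "a")): 1.0, ("s1", ("a", "s1")): 1.0,
     ("s2", ("r", "a")): 1.0, ("s2", ("a", "s2")): 1.0}
feasible = is_lp_feasible(nodes, edges, "r", ["s1", "s2"], y, f)
too_cheap = is_lp_feasible(nodes, edges, "r", ["s1", "s2"],
                           {**y, ("r", "a"): 0.5}, f)
```

In the second call the capacity of the root edge is lowered to 0.5 while both terminals still route a full unit through it, so the constraint $f_{s,e} \le y_e$ is violated.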

We will only address either groups of $y_e$ variables (then we write $y_H := Y_{H,\emptyset}$ for $H \subseteq E$), or we address groups of $f_{s,e}$ variables for the same terminal $s \in X$. Then we write $f_{s,H} := Y_{\emptyset, \{(s,e) \mid e \in H\}}$.

4 The rounding algorithm 

By Theorem 1 we may assume that the node set is partitioned into levels $V_0 = \{r\}, V_1, \ldots, V_\ell = X$ and all edges are running between consecutive layers (i.e. $E \subseteq \bigcup_{j=1}^{\ell} (V_{j-1} \times V_j)$). See Figure 1 for an illustration. In the following, we will present an adaptation of the [GKR00] rounding scheme to sample a set $T$ of paths from a distribution that depends on $Y$. For this sake, starting at layer 0, we will go through all layers and for each path $P$ (ending in node $u$) that is sampled so far, we will extend it to $P \cup \{(u, v)\}$ with probability $y_{P \cup \{(u,v)\}} / y_P$.
(1) $T := \emptyset$
(2) FOR ALL $e \in \delta^+(r)$ DO
(3)     independently, with prob. $y_{\{e\}}$, add path $\{e\}$ to $T$
(4) FOR $j = 1, \ldots, \ell - 1$ DO
(5)     FOR ALL $u \in V_j$ and all $r$-$u$ paths $P \in T$ DO
(6)         FOR ALL $e \in \delta^+(u)$ DO
(7)             independently with prob. $y_{P \cup \{e\}} / y_P$ add $P \cup \{e\}$ to $T$
(8) return $E(T)$.

$^5$Unfortunately, the instance has a number of nodes which is exponential in the number of terminals. Of course, the instance of [HKK+03] provides a $\Omega(\log^2 n)$ gap. To the best of our knowledge, there is no known instance with a $\omega(\log^2 n)$ integrality gap.

With $V(P)$ we denote the set of vertices on path $P$. Furthermore, let $E(T) := \bigcup_{P \in T} P$ be the set of all edges on any path of $T$. Also let $V(T) := \bigcup_{P \in T} V(P)$. Note that we did not remove partial paths (i.e. paths from $r$ to some layer $j < \ell$), which will turn out to be convenient later.
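The sampling procedure above can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the oracle `y` (mapping a set of edges $P$ to the value $y_P$), the adjacency dictionary `out_edges`, and the toy instance are assumptions introduced here. The example uses an integral oracle, for which the procedure is deterministic and returns exactly the edges of the encoded tree.

```python
import random

def sample_paths(levels, out_edges, root, y, rng=random):
    """One run of the path sampling procedure.

    levels: number of layers l; out_edges[u]: list of edges (u, v) leaving u;
    y: oracle mapping a frozenset of edges P to the value y_P.
    Returns the set T of sampled root paths, each stored as a tuple of edges.
    """
    frontier = [(e,) for e in out_edges.get(root, [])
                if rng.random() < y(frozenset([e]))]
    sampled = set(frontier)
    for _ in range(1, levels):               # j = 1, ..., l - 1
        next_frontier = []
        for path in frontier:
            u = path[-1][1]                  # end node of the partial path
            y_path = y(frozenset(path))      # positive, since path was sampled
            for e in out_edges.get(u, []):
                extended = path + (e,)
                if rng.random() < y(frozenset(extended)) / y_path:
                    next_frontier.append(extended)
        sampled.update(next_frontier)        # partial paths are kept in T
        frontier = next_frontier
    return sampled

def edges_of(paths):
    """E(T): union of the edges on all sampled paths."""
    return {e for path in paths for e in path}

# Toy 2-level instance with an integral oracle: y_P = 1 iff P lies in the tree.
tree = {("r", "a"), ("a", "s1"), ("a", "s2")}
out_edges = {"r": [("r", "a"), ("r", "b")],
             "a": [("a", "s1"), ("a", "s2")],
             "b": [("b", "s1")]}
oracle = lambda P: 1.0 if P <= tree else 0.0
T = sample_paths(2, out_edges, "r", oracle)
```

Only one oracle value is queried per candidate extension, which matches the observation later in the text that the algorithm inspects few entries of $Y$.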



5 The analysis 

The analysis consists of two parts:

(i) We show that for each edge $e$ the probability to be included is $\Pr[e \in E(T)] \le y_{\{e\}}$.

(ii) We prove that for each terminal $s \in X$, the probability to be connected by a path satisfies $\Pr[s \in V(T)] \ge \Omega(1/\ell)$.

Part (i) provides that the expected cost for the sampled paths is at most $OPT_f$, while part (ii) implies that after repeating the sampling procedure $O(\ell \log |X|)$ times, each terminal will be connected to the root with high probability. Let us begin with part (i).

Upper bounding the expected cost 

For each node $v \in V$, let $Q(v) := \{P \mid P \text{ is } r\text{-}v \text{ path}\}$ be the set of paths from the root to $v$. For an edge $e = (u, v) \in E$ we denote $Q(e)$ as the set of $r$-$v$ paths that have $e$ as last edge.

Lemma 3. Let $P$ be an $r$-$v$ path with $v \in V$. Then $\Pr[P \in T] = y_P$.

Proof. Let $P = (e_1, \ldots, e_j)$ be the path with $e_i \in V_{i-1} \times V_i$. Then the probability that the algorithm samples path $P$ is

\[ \Pr[P \in T] = y_{\{e_1\}} \cdot \frac{y_{\{e_1, e_2\}}}{y_{\{e_1\}}} \cdot \ldots \cdot \frac{y_P}{y_{P \setminus \{e_j\}}} = y_P. \]

□

The next lemma will imply that each edge $e$ is sampled with probability at most its fractional value $y_{\{e\}}$.

Lemma 4. For any edge $e \in E$, one has $\sum_{P \in Q(e)} y_P \le y_{\{e\}}$.

Proof. We prove the following claim by induction over $j = 0, \ldots, \ell - 1$: For any edge $e \in V_j \times V_{j+1}$ and any solution $Y \in \mathrm{Las}_{t'}(K)$ with $t' \ge j$ one has $\sum_{P \in Q(e)} y_P \le y_{\{e\}}$.$^6$

The claim is clear for $j = 0$, thus consider an edge $e = (u, v) \in V_j \times V_{j+1}$ between the $j$th and the $(j+1)$th level. Applying Thm. 2(d) with $I := \{e\}$ we write $Y = y_{\{e\}} \cdot Y^{(1)} + (1 - y_{\{e\}}) \cdot Y^{(0)}$ such that $Y^{(0)}, Y^{(1)} \in \mathrm{Las}_{t'-1}(K)$ and $y^{(1)}_{\{e\}} = 1$ as well as $y^{(0)}_{\{e\}} = 0$. For edges $e' \in \delta^-(u)$ ingoing to $u$, we apply the induction hypothesis and get $\sum_{P \in Q(e')} y^{(1)}_P \le y^{(1)}_{\{e'\}}$. Since $y^{(1)}_{\{e\}} = 1$, we know that $y^{(1)}_{\{e\} \cup P'} = y^{(1)}_{P'}$ (see Theorem 2(h)). It follows that

\[
\sum_{P \in Q(e)} y_P
= y_{\{e\}} \cdot \sum_{P \in Q(e)} y^{(1)}_P
= y_{\{e\}} \sum_{e' \in \delta^-(u)} \sum_{P' \in Q(e')} \underbrace{y^{(1)}_{P' \cup \{e\}}}_{= y^{(1)}_{P'}}
\le y_{\{e\}} \underbrace{\sum_{e' \in \delta^-(u)} y^{(1)}_{\{e'\}}}_{\le 1}
\le y_{\{e\}}.
\]

□

$^6$We abbreviate $y_P := Y_{P,\emptyset}$.

Combining Lemmas 3 and 4 (with a union bound over the paths $P \in Q(e)$) we obtain $\Pr[e \in E(T)] \le y_{\{e\}}$ and consequently $E[c(E(T))] \le \sum_{e \in E} c_e y_{\{e\}}$ by linearity of expectation.

Lower bounding the success probability 

In the following Lemma, we relate the "path variables" $y_P$ with the edge capacities $y_{\{e\}}$. In fact, (a) will imply that each terminal is connected once in expectation and (b) bounds the probability that $s$ is connected by a path containing a fixed subpath $P'$:

Lemma 5. Fix a terminal $s \in X$ and an $r$-$v$ path $P'$ for some $v \in V$. Then

a) $\sum_{P \in Q(s)} y_P = 1$
b) $\sum_{P \in Q(s): P' \subseteq P} y_P \le y_{P'}$.

Proof. Consider the set of variables $S := \{f_{s,e} \mid e \in E\}$. If more than $\ell$ of those variables are set to 1, they cannot define a feasible unit flow. In other words, we can apply Theorem 2(e) in order to write $Y = \sum_{H \subseteq E} \lambda_H Y^H$ as a convex combination of vectors such that for all $H$ with $\lambda_H > 0$ one has: (i) $Y^H \in \mathrm{Las}_\ell(K)$; (ii) $f^H_{s,e} \in \{0, 1\}$ for all $e \in E$ and (iii) $f^H_{s,e} = 1 \Leftrightarrow e \in H$ (again we abbreviate $f^H_{s,e} := Y^H_{\emptyset, \{(s,e)\}}$ and $y^H_e := Y^H_{\{e\}, \emptyset}$).

But the variables $\{f^H_{s,e} \mid e \in E\}$ can only represent a unit $r$-$s$ flow if $H$ is an $r$-$s$ path as well. Thus we have $\lambda_H = 0$, whenever this is not the case. In other words, our convex combination is of the form $Y = \sum_{P \in Q(s)} \lambda_P Y^P$.

Using Theorem 2(b) we obtain $y^P_e \le 1$ for all $e \in P$ and the LP constraints imply $y^P_e \ge f^P_{s,e} = 1$. We conclude that $y^P_e = 1$ for all $e \in P$. Then Theorem 2(f) provides $y^P_P = 1$.

Conversely, consider any $r$-$s$ path $P' \in Q(s)$ with $P' \ne P$ and let $v \in V(P)$ be a vertex where path $P'$ enters $P$, i.e. $P \cap \delta^-(v) = \{e\}$ and $P' \cap \delta^-(v) = \{e'\}$ with $e \ne e'$. Since $y^P_e = 1$ and $\sum_{e'' \in \delta^-(v)} y^P_{e''} \le 1$ (by LP constraint), we have $y^P_{e'} = 0$ and thus $y^P_{P'} = 0$. We conclude Claim a), since

\[
\sum_{P \in Q(s)} y_P = \sum_{P \in Q(s)} \sum_{P' \in Q(s)} \lambda_{P'} \underbrace{y^{P'}_P}_{= 0 \text{ if } P' \ne P} = \sum_{P \in Q(s)} \lambda_P
\]

and $\sum_{P \in Q(s)} \lambda_P = 1$.

To see b), note that $y^P_{P'} = 1$, whenever $P' \subseteq P$. Thus

\[
y_{P'} = \sum_{P \in Q(s)} \lambda_P y^P_{P'} \ge \sum_{P \in Q(s): P' \subseteq P} \lambda_P = \sum_{P \in Q(s): P' \subseteq P} y_P.
\]

□

In the following, we fix a terminal $s \in X$ and define $Z := |T \cap Q(s)|$ as the random variable that yields the number of sampled paths that end in $s$. Our goal is to show that $\Pr[Z \ge 1] \ge \Omega(1/\ell)$. Recall that by Lemma 5(a) and Lemma 3 we know already that

Corollary 6. $E[Z] = 1$.

Interestingly, the key insight of Garg, Konjevod and Ravi [GKR00] is to prove an upper bound on $Z$ in order to lower bound $\Pr[Z \ge 1]$.

Lemma 7. $E[Z \mid Z \ge 1] \le \ell + 1$.

Proof. Fix a path $P = (e_1, \ldots, e_\ell) \in Q(s)$. It suffices to show $E[Z \mid P \in T] \le \ell + 1$.$^7$ Let $P_i = (e_1, \ldots, e_i) \subseteq P$ be the $r$-subpath of $P$ containing the first $i$ edges.

Consider any path $P' \in Q(s)$ and say it contains $P_i$, but not $P_{i+1}$. Since the probability distribution depends only on the "joint history" of $P$ and $P'$, we have $\Pr[P' \in T \mid P \in T] = \Pr[P' \in T \mid P_i \in T]$. We use this to bound

\[
E\big[\,|\{P' \in T \cap Q(s) \mid P_i \subseteq P';\ P_{i+1} \not\subseteq P'\}| \;\big|\; P \in T\big]
\le \sum_{P' \in Q(s): P' \supseteq P_i} \Pr[P' \in T \mid P_i \in T]
\overset{\text{cond. prob.}}{=} \sum_{P' \in Q(s): P' \supseteq P_i} \frac{\Pr[P' \in T \text{ and } P_i \in T]}{\Pr[P_i \in T]}
\overset{\text{Lemma 3}}{=} \frac{1}{y_{P_i}} \sum_{P' \in Q(s): P' \supseteq P_i} y_{P'}
\overset{\text{Lemma 5}}{\le} 1.
\]

The claim follows since there are only $\ell + 1$ such paths $P_i$. □

$^7$The formal argument works as follows: Let $A_1, \ldots, A_m$ be any events (in our application, $A_P$ is the event "$P \in T$, conditioned on $Z \ge 1$") and $Z := |\{i \mid A_i\}|$ the number of occurring events. We claim that $E[Z] \le \max_{i \in [m]} E[Z \mid A_i] =: \rho$. Proof: Using Jensen's inequality $E[Z]^2 \le E[Z^2] = \sum_{i,j} \Pr[A_i \cap A_j] = \sum_i \Pr[A_i] \cdot \sum_j \Pr[A_j \mid A_i] = \sum_i \Pr[A_i] \cdot E[Z \mid A_i] \le \rho \sum_i \Pr[A_i] = \rho E[Z]$. Rearranging yields the claim.

Garg-Konjevod-Ravi [GKR00] make use of a sophisticated probabilistic result, the Janson Inequality (see e.g. [AS08]). However, the desired bound can be achieved much more easily:

Lemma 8. $\Pr[Z \ge 1] \ge \frac{1}{\ell + 1}$.

Proof. By the law of total probability

\[
1 = E[Z] = \Pr[Z = 0] \cdot \underbrace{E[Z \mid Z = 0]}_{= 0} + \Pr[Z \ge 1] \cdot \underbrace{E[Z \mid Z \ge 1]}_{\le \ell + 1 \text{ by Lem. 7}}
\]

thus $\Pr[Z \ge 1] \ge \frac{1}{\ell + 1}$. □

Finally, we show the $O(\ell \log|X|)$ integrality gap. Interestingly, though the Lasserre solution $Y$ has $n^{O(\ell)}$ entries, we only query a polynomial number of entries $y_P$. In other words, if we could evaluate each single entry $y_P$ in polynomial time, the algorithm would be polynomial as well.

Theorem 9. Let $Y \in \mathrm{Las}_t(K)$ be a given $t = 2\ell$ round Lasserre solution. Then one can compute a feasible solution $H \subseteq E$ with $E[c(H)] \le O(\ell \log|X|) \cdot \sum_{e \in E} c_e y_{\{e\}}$. The expected number of Lasserre queries and the expected overhead running time are both polynomial in $n$.

Proof. Repeat the sampling algorithm $2\ell \log|X|$ many times and let $H$ be the union of the sampled paths. The probability that a fixed terminal $s \in X$ is not connected is bounded by $(1 - \frac{1}{\ell + 1})^{2\ell \log|X|} \le \frac{1}{|X|}$. For any terminal $s$ that remains unconnected, we buy the cheapest $r$-$s$ path. The expected cost for this repair step is bounded by $|X| \cdot \frac{1}{|X|} \cdot \sum_{e \in E} c_e y_{\{e\}}$.

Consider a single sample $T$. For every node $v \in V$ (no matter whether terminal or not), the expected number of paths $P \in T$ connecting $v$ is upper bounded by one. Thus the expected number of sampled partial paths is bounded by $n$ (if we denote the total number of nodes on all layers by $n$). After a partial path $P$ is sampled, the algorithm queries at most $n$ values of the form $y_{P \cup \{e\}}$. Thus the total expected number of queries for a single sample is upper bounded by $n^2$. □
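The failure-probability bound used in the proof of Theorem 9 is easy to check numerically. The sketch below is illustrative only: the function name `failure_probability` is hypothetical, the logarithm is assumed to be the natural one, and the number of repetitions $2\ell \log|X|$ is treated as a real number rather than rounded up.

```python
import math

def failure_probability(levels: int, num_terminals: int) -> float:
    """(1 - 1/(l+1))^(2 l log|X|): bound on Pr[a fixed terminal stays unconnected]."""
    rounds = 2 * levels * math.log(num_terminals)
    return (1.0 - 1.0 / (levels + 1)) ** rounds

# The bound should never exceed 1/|X|, for any number of levels l >= 1.
bound_holds = all(
    failure_probability(l, k) <= 1.0 / k + 1e-12
    for l in range(1, 30) for k in (2, 10, 1000, 10**6)
)
```

The reason is that $(1 - \frac{1}{\ell+1})^{\ell+1} \le 1/e$ and $2\ell \ge \ell + 1$ for $\ell \ge 1$, so each block of $\ell + 1$ repetitions reduces the failure probability by at least a factor of $e$.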

Together with Theorem 1 and the fact that $Y$ can be computed in time $n^{O(\ell)}$, this provides a polynomial time $|X|^{\varepsilon}$-approximation algorithm for any constant $\varepsilon > 0$. If we choose $\ell = \log|X|$, then we obtain a $O(\log^3 |X|)$ approximation in time $n^{O(\log|X|)}$.

Remark 1. Observe that we explicitly used the Decomposition Theorem of [KMN11] in the proof of Lemma 5. Since the Decomposition Theorem does not hold for the Sherali-Adams or Lovász-Schrijver hierarchy, it is not clear whether the same integrality gap bound is true for those weaker relaxations.

However, there is a well-known reduction from a level-$\ell$ instance of Directed Steiner Tree to a tree instance $F$ of Group Steiner Tree such that the produced tree $F$ has size $n^{O(\ell)}$ and contains all possible integral DST solutions as subtrees. Of course, the corresponding Group Steiner Tree LP for this instance $F$ has only a polylogarithmic integrality gap [GKR00] and can also be interpreted as an LP for Directed Steiner Tree.

It remains a challenging open problem whether there is a convex relaxation with a $\mathrm{polylog}(|X|)$ integrality gap that can be solved in polynomial time. Note that it would in fact suffice to have a polynomial time oracle that takes a single path $P \subseteq E$ as input and outputs the Lasserre entry $y_P$.

Acknowledgements. The author is very grateful to David Pritchard for carefully reading a 
preliminary draft and to Michel X. Goemans, Neil Olver, Rico Zenklusen, Mohit Singh and 
David Steurer for helpful discussions and remarks. 
A Properties of the Lasserre Hierarchy

The main goal of this section is to present a complete proof of the convergence of the Lasserre hierarchy (Theorem 2(a)); the feasibility of "conditioned" solutions (Theorem 2(d)) and the Karlin-Mathieu-Nguyen Decomposition Theorem [KMN11] (Theorem 2(e)). In the following, we are going to reproduce the proof in [KMN11]. However, we will use a different notation and try to put the emphasis on an intuitive exposition instead of a space efficient one.

Let $\mathcal{P}([n]) := \mathcal{P}_n([n]) = \{I \mid I \subseteq [n]\}$ be the family of all subsets of $[n]$. Recall that $K = \{x \in \mathbb{R}^n \mid Ax \ge b\}$ is the set of relaxed solutions with $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$.

A. 1 The Inclusion-Exclusion Formula 

Suppose for the moment that $y \in \mathbb{R}^{\mathcal{P}([n])}$ indeed is consistent, i.e. there exists a random variable $Z \in \{0, 1\}^n$ such that $\Pr[\bigwedge_{i \in I} (Z_i = 1)] = y_I$ (thus $E[Z_i] = y_{\{i\}}$). Here, $Z$ can be taken from any distribution; in particular, $Z_i$ and $Z_{i'}$ do not need to be independent.

Initially, the Lasserre relaxation contains only variables for "positive" events of the form $\bigwedge_{i \in I} (Z_i = 1)$. But using the inclusion-exclusion formula, one can also obtain probabilities for all other events.

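Recall that for any index set $J \subseteq [n]$ the inclusion-exclusion formula says that

\[
\Pr\Big[\bigvee_{i \in J} (Z_i = 1)\Big] = \sum_{\emptyset \ne H \subseteq J} (-1)^{|H|+1} \Pr\Big[\bigwedge_{i \in H} (Z_i = 1)\Big].
\]

Negating this event yields

\[
\Pr\Big[\bigwedge_{i \in J} (Z_i = 0)\Big] = 1 - \Pr\Big[\bigvee_{i \in J} (Z_i = 1)\Big] = \sum_{H \subseteq J} (-1)^{|H|} \Pr\Big[\bigwedge_{i \in H} (Z_i = 1)\Big]. \qquad (1)
\]
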



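Next, let $I \subseteq [n]$ be another index set (not necessarily disjoint to $J$). Observe that Equation (1) remains valid if all events are intersected with the same event $\bigwedge_{i \in I} (Z_i = 1)$. In other words we arrive at the generalized inclusion-exclusion formula (sometimes called Möbius inversion)

\[
\Pr\Big[\bigwedge_{i \in I} (Z_i = 1) \wedge \bigwedge_{i \in J} (Z_i = 0)\Big] = \sum_{H \subseteq J} (-1)^{|H|} \Pr\Big[\bigwedge_{i \in I \cup H} (Z_i = 1)\Big]. \qquad (2)
\]

Thus for any $I, J \subseteq [n]$ we define

\[
y_{I,-J} := \sum_{H \subseteq J} (-1)^{|H|} y_{I \cup H} = \Pr\Big[\bigwedge_{i \in I} (Z_i = 1), \bigwedge_{i \in J} (Z_i = 0)\Big] \qquad (3)
\]
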

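(for example $y_{\emptyset,-\{i\}} = y_\emptyset - y_{\{i\}}$). If $I \cup J = [n]$, then we abbreviate $y_{I,-J} =: y_x$ as the probability for the atomic event $x \in \{0, 1\}^n$ with

\[
x_i = \begin{cases} 1 & i \in I \\ 0 & i \in J. \end{cases}
\]

Furthermore we denote $\mathrm{supp}(x) := \{i \in [n] \mid x_i = 1\}$ and $\overline{\mathrm{supp}}(x) := \{i \in [n] \mid x_i = 0\}$. We saw so far that the $2^n$ many probabilities $y_I$ uniquely define the $2^n$ many probabilities $y_x$ for atomic events. Conversely, one can obtain the values $y_I$ and $y_{I,-J}$ by summing over all atomic events that are consistent with the events, i.e.

\[
y_I = \sum_{x \in \{0,1\}^n : I \subseteq \mathrm{supp}(x)} y_x \qquad (4)
\]
\[
y_{I,-J} = \sum_{x \in \{0,1\}^n : I \subseteq \mathrm{supp}(x),\, J \subseteq \overline{\mathrm{supp}}(x)} y_x. \qquad (5)
\]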


Let us make a couple of observations: 

• Equations (3) and (4) are both linear, thus they define an isomorphism between $(y_I)_{I \subseteq [n]}$ and $(y_x)_{x \in \{0,1\}^n}$. This isomorphism is well defined even if $y$ is not consistent (i.e. even if some $y_x$ are negative or $\sum_{x \in \{0,1\}^n} y_x \ne 1$).

• If $I \cap J \ne \emptyset$, then by definition one has $y_{I,-J} = 0$ since the sum in Eq. (3) can be grouped into pairs that have the same absolute value but different signs (for example $y_{\{1\},-\{1,2\}} = y_{\{1\} \cup \emptyset} - y_{\{1\} \cup \{1\}} - y_{\{1\} \cup \{2\}} + y_{\{1\} \cup \{1,2\}} = 0$).
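The signed quantities $y_{I,-J}$ can be computed directly from the definition. The following Python sketch is an illustration under assumed names (`subsets`, `moments`, `y_signed` are hypothetical helpers, and coordinates are indexed from 0); it starts from an explicit distribution on $\{0,1\}^3$, derives the values $y_I$, and recovers probabilities of mixed events via the inclusion-exclusion sum.

```python
from fractions import Fraction
from itertools import combinations

def subsets(items):
    """All subsets of the given collection, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def moments(distribution, n):
    """y_I = Pr[Z_i = 1 for all i in I] for every I, from the values Pr[Z = x]."""
    return {I: sum(p for x, p in distribution.items()
                   if all(x[i] == 1 for i in I))
            for I in subsets(range(n))}

def y_signed(y, I, J):
    """y_{I,-J} = sum over H subset of J of (-1)^|H| * y_{I union H}."""
    return sum((-1) ** len(H) * y[frozenset(I) | H] for H in subsets(J))

# Example distribution on {0,1}^3 with correlated coordinates.
dist = {(1, 1, 0): Fraction(1, 2), (0, 1, 1): Fraction(1, 3),
        (0, 0, 0): Fraction(1, 6)}
y = moments(dist, 3)
prob_z0_one_z2_zero = y_signed(y, {0}, {2})   # Pr[Z_0 = 1, Z_2 = 0]
overlapping = y_signed(y, {0}, {0, 1})        # I and J intersect
atom = y_signed(y, {1, 2}, {0})               # atomic event x = (0, 1, 1)
```

Here the first value equals the mass of the single outcome $(1,1,0)$, the second vanishes because $I \cap J \ne \emptyset$, and the third recovers the probability of the atom $(0,1,1)$.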
Remark 2. Lemma 2 in the survey of Laurent [Lau03a] states the following equivalences for $y \in \mathbb{R}^{\mathcal{P}([n])}$:

• $M_n(y) \succeq 0 \;\Leftrightarrow\; \forall x \in \{0,1\}^n : y_x \ge 0$

• $M_n(\binom{A_\ell}{b_\ell} * y) \succeq 0 \;\Leftrightarrow\; \forall x \in \{0,1\}^n : y_x \cdot (A_\ell x - b_\ell) \ge 0$

If both conditions hold and additionally $y_\emptyset = 1$, then $\sum_{x \in \{0,1\}^n} y_x = y_\emptyset = 1$ and the $y_x$ define a probability distribution over $\{0,1\}^n$ with $y_x = 0$ for all $x$ with $Ax \not\ge b$. In other words

\[
(y_{\{1\}}, \ldots, y_{\{n\}}) = \sum_{x \in \{0,1\}^n : Ax \ge b} y_x \cdot x
\]

is a convex combination of feasible points. However, we will show the convergence proof following [KMN11], which has more synergy effects with the decomposition theorem.



A.2 Partial assignments and the inversion formula 

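Let $\mathcal{F} \subseteq \mathcal{P}([n])$ be a family of index sets and let $y \in \mathbb{R}^{\mathcal{F}}$ be a corresponding vector. For subsets $X \subseteq S \subseteq [n]$, we define the conditioning on $X$ and $-S \setminus X$ (sometimes also called partial assignment) as the vector $z = \{y\}_{X,-S \setminus X} \in \mathbb{R}^{\mathcal{F} \ominus S}$ with $z_I := y_{I \cup X, -S \setminus X} = \sum_{H \subseteq S \setminus X} (-1)^{|H|} y_{I \cup X \cup H}$ and $\mathcal{F} \ominus S := \{I \in \mathcal{F} \mid \forall J \subseteq S : I \cup J \in \mathcal{F}\}$. The definition of entry $z_I$ only makes sense if $I \cup J \in \mathcal{F}$ for all $J \subseteq S$, thus $\mathcal{F} \ominus S$ is the maximum family of sets for which this is satisfied. Note that the set $\mathcal{F} \ominus S$ is not necessarily smaller than $\mathcal{F}$. For example $\mathcal{P}([n]) \ominus S = \mathcal{P}([n])$ and $\mathcal{P}_t([n]) \ominus S \supseteq \mathcal{P}_{t-|S|}([n])$.

We define the normalized conditioning on $X$ and $-S \setminus X$ as $w := \frac{z}{z_\emptyset}$ (say $w := 0$ if $z_\emptyset = 0$ to be well-defined). The intuition is that if $Z \in \{0, 1\}^n$ again is a random variable with $y_{I,-J} = \Pr[\bigwedge_{i \in I} (Z_i = 1), \bigwedge_{i \in J} (Z_i = 0)]$, then the (normalized) conditioned solution reflects conditional probabilities, i.e.

\[
w_{I,-J} = \Pr\Big[\bigwedge_{i \in I} (Z_i = 1), \bigwedge_{i \in J} (Z_i = 0) \;\Big|\; \bigwedge_{i \in X} (Z_i = 1), \bigwedge_{i \in S \setminus X} (Z_i = 0)\Big].
\]

The events $(X, -S \setminus X)$ obviously partition the probability space if $X$ runs over all subsets of $S$. This remains valid for conditioned Lasserre solutions.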

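Lemma 10 (Inversion formula). Let $y \in \mathbb{R}^{\mathcal{P}([n])}$ and $S \subseteq [n]$. Then $y = \sum_{X \subseteq S} \{y\}_{X,-S \setminus X}$.

Proof. We verify the equation for entry $I \subseteq [n]$:

\[
\sum_{X \subseteq S} y_{I \cup X, -S \setminus X}
= \sum_{X \subseteq S} \;\; \sum_{x \in \{0,1\}^n : I \cup X \subseteq \mathrm{supp}(x),\, S \setminus X \subseteq \overline{\mathrm{supp}}(x)} y_x
= \sum_{x \in \{0,1\}^n : I \subseteq \mathrm{supp}(x)} y_x \cdot \underbrace{|\{X \subseteq S : X \subseteq \mathrm{supp}(x),\, S \setminus X \subseteq \overline{\mathrm{supp}}(x)\}|}_{= 1}
= y_I.
\]

□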
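The inversion formula holds for every vector $y$, consistent or not, and this can be checked by brute force on a small ground set. The sketch below is illustrative only; the helpers `subsets` and `condition` and the arbitrary test vector are assumptions introduced here, with coordinates indexed from 0.

```python
from fractions import Fraction
from itertools import combinations

def subsets(items):
    """All subsets of the given collection, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def condition(y, X, S, n):
    """Conditioning on X and -(S minus X):
    z_I = sum over H subset of (S minus X) of (-1)^|H| * y_{I u X u H}."""
    X, rest = frozenset(X), frozenset(S) - frozenset(X)
    return {I: sum((-1) ** len(H) * y[I | X | H] for H in subsets(rest))
            for I in subsets(range(n))}

# An arbitrary (not necessarily consistent) vector y indexed by subsets of [3].
n = 3
y = {I: Fraction(1 + 7 * sum(2 ** i for i in I), 11) for I in subsets(range(n))}
S = {0, 2}
total = {I: Fraction(0) for I in y}
for X in subsets(S):
    z = condition(y, X, S, n)
    for I in y:
        total[I] += z[I]
inversion_holds = (total == y)
```

Summing the conditioned vectors over all $X \subseteq S$ reproduces $y$ entry by entry, because for every nonempty $W = X \cup H \subseteq S$ the signed contributions of $y_{I \cup W}$ cancel.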
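A.3 Feasibility of conditioned solutions

Next, we will see that conditioned solutions are still feasible on a smaller family of index sets:

Lemma 11. Let $X \subseteq S \subseteq [n]$, and $y \in \mathbb{R}^{\mathcal{P}([n])}$ with $\mathcal{F} \subseteq \mathcal{P}([n])$. Then

\[
M_{\mathcal{F}}(y) \succeq 0 \;\Longrightarrow\; M_{\mathcal{F} \ominus S}(\{y\}_{X,-S \setminus X}) \succeq 0.
\]

Proof. Abbreviate $z := \{y\}_{X,-S \setminus X}$. Equivalently to $M_{\mathcal{F}}(y) \succeq 0$, there must be vectors $v_I$ with $v_I \cdot v_J = y_{I \cup J}$ for $I, J \in \mathcal{F}$. For each $I \in \mathcal{F} \ominus S$, we define another vector

\[
w_I := \sum_{H \subseteq S \setminus X} (-1)^{|H|} v_{I \cup X \cup H}.
\]

We claim that those vectors provide a factorization of $M_{\mathcal{F} \ominus S}(z)$ (which proves the positive semi-definiteness of $M_{\mathcal{F} \ominus S}(z)$):

\[
\begin{aligned}
w_I \cdot w_J &= \sum_{H \subseteq S \setminus X} \sum_{L \subseteq S \setminus X} (-1)^{|H| + |L|} \, v_{I \cup X \cup H} \cdot v_{J \cup X \cup L} \\
&= \sum_{H \subseteq S \setminus X} (-1)^{|H|} \underbrace{\sum_{L \subseteq S \setminus X} (-1)^{|L|} \, y_{I \cup J \cup X \cup H \cup L}}_{(*)\; = 0 \text{ if } H \ne \emptyset} \\
&= \sum_{L \subseteq S \setminus X} (-1)^{|L|} \, y_{I \cup J \cup X \cup L} = y_{I \cup J \cup X, -S \setminus X} = z_{I \cup J}.
\end{aligned}
\]

To see $(*)$, observe that if there is any $i \in H \subseteq S \setminus X$, then the sum contains the same term for $L \subseteq (S \setminus X) \setminus \{i\}$ and $L \cup \{i\}$, just with different sign, so that the sum evaluates to 0. □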

The following lemma is usually called commutativity of the shift operator (see e.g. [Lau03a]).
Recall that for $y\in\mathbb{R}^{\mathcal{T}}$, we interpret $w = \binom{a}{\beta} * y$ as the vector with $w_I := \sum_{i\in[n]} a_i y_{I\cup\{i\}} - \beta y_I$.^

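Lemma 12. Let $y\in\mathbb{R}^{\mathcal{T}}$; $X\subseteq S\subseteq[n]$; and let $a^Tx\ge\beta$ be a linear constraint. Then

$$\tbinom{a}{\beta} * \{y\}_{X,-S\setminus X} = \big\{\tbinom{a}{\beta} * y\big\}_{X,-S\setminus X}\quad(\in\mathbb{R}^{\mathcal{T}'})$$

with $\mathcal{T}' := \{I\subseteq[n]\mid I\cup J\cup\{i\}\in\mathcal{T}\ \ \forall J\subseteq S,\ i\in[n]\}$.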

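Proof. Let $z_I = y_{I\cup X,-S\setminus X}$ and $u_I = \sum_{i\in[n]} a_i y_{I\cup\{i\}} - \beta y_I$. Evaluating the left hand side vector at entry $I\in\mathcal{T}'$ gives

$$\big(\tbinom{a}{\beta} * \{y\}_{X,-S\setminus X}\big)_I = \sum_{i\in[n]} a_i z_{I\cup\{i\}} - \beta z_I = \sum_{i\in[n]} a_i\, y_{I\cup X\cup\{i\},-S\setminus X} - \beta\, y_{I\cup X,-S\setminus X}.$$

The right hand side entry for $I$ is

$$u_{I\cup X,-S\setminus X} = \sum_{H\subseteq S\setminus X}(-1)^{|H|}\, u_{X\cup I\cup H} = \sum_{H\subseteq S\setminus X}(-1)^{|H|}\Big(\sum_{i\in[n]} a_i\, y_{X\cup I\cup\{i\}\cup H} - \beta\, y_{I\cup X\cup H}\Big).$$

Both expressions are identical and the claim follows. □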
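The commutativity rule is a purely algebraic identity, so it can be tested on an arbitrary vector. The sketch below is only an illustration under stated assumptions: the helper names `shift` and `condition`, the test vector `y`, the constraint `(a, beta)` and the 0-based indices are hypothetical, and $\mathcal{T}$ is taken to be the full power set so that every entry is defined.

```python
from fractions import Fraction
from itertools import combinations

def subsets(ground):
    """All subsets of `ground`, returned as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def shift(y, a, beta):
    """(a, beta) * y: entry I equals sum_i a_i * y_{I u {i}} - beta * y_I."""
    return {I: sum(a[i] * y[I | {i}] for i in range(len(a))) - beta * y[I]
            for I in y}

def condition(y, X, S):
    """{y}_{X,-S\\X}: entry I equals sum over H subset of S\\X of (-1)^|H| y_{I u X u H}."""
    return {I: sum((-1) ** len(H) * y[I | X | H] for H in subsets(S - X))
            for I in y}

n = 3
# Arbitrary (hypothetical) vector indexed by all subsets of {0, 1, 2}.
y = {I: Fraction(1 + sum(2 ** i for i in I), 1 + len(I))
     for I in subsets(range(n))}
a, beta = [2, -1, 3], 4          # hypothetical constraint a^T x >= beta
S, X = frozenset({0, 1}), frozenset({1})

lhs = shift(condition(y, X, S), a, beta)   # shift after conditioning
rhs = condition(shift(y, a, beta), X, S)   # conditioning after shift
commutes = all(lhs[I] == rhs[I] for I in y)
print(commutes)
```

Because both operators are linear maps built from the index shifts $I\mapsto I\cup J$, the check succeeds for any choice of `y`, not only for vectors arising from a distribution.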



^In fact, we were sloppy concerning the dimension of $w$ so far. Formally, one should define $w\in\mathbb{R}^{\mathcal{T}'}$ with
$\mathcal{T}' := \{I\mid I\cup\{i\}\in\mathcal{T}\ \ \forall i\in[n]\}$.
Now we will see that normalized conditioned solutions are feasible and integral on variables in $S$:

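Lemma 13. Let $X\subseteq S\subseteq[n]$, $\mathcal{T}_1,\mathcal{T}_2\subseteq\mathcal{P}([n])$ and $y\in\mathbb{R}^{\mathcal{P}([n])}$ with

$$M_{\mathcal{T}_1}(y)\succeq 0;\qquad M_{\mathcal{T}_2}\big(\tbinom{A_\ell}{b_\ell} * y\big)\succeq 0\ \ \forall\ell\in[m];\qquad y_\emptyset = 1.$$

Define $z := \{y\}_{X,-S\setminus X}\in\mathbb{R}^{\mathcal{P}([n])}$. If $z_\emptyset > 0$, then for $w := \frac{z}{z_\emptyset}$ one has

• $M_{\mathcal{T}_1\ominus S}(w)\succeq 0$; $M_{\mathcal{T}_2\ominus S}\big(\tbinom{A_\ell}{b_\ell} * w\big)\succeq 0\ \ \forall\ell\in[m]$; $w_\emptyset = 1$.

• $w_I = 1$ if $I\subseteq X$

• $w_I = 0$ if $I\subseteq S$ but $I\cap(S\setminus X)\neq\emptyset$.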

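Proof. Lemma 11 directly implies that $M_{\mathcal{T}_1\ominus S}(w) = \frac{1}{z_\emptyset}M_{\mathcal{T}_1\ominus S}(z)\succeq 0$. Applying the commutativity rule,

$$M_{\mathcal{T}_2\ominus S}\big(\tbinom{A_\ell}{b_\ell} * w\big) = \frac{1}{z_\emptyset}M_{\mathcal{T}_2\ominus S}\big(\tbinom{A_\ell}{b_\ell} * \{y\}_{X,-S\setminus X}\big) \overset{\text{Lemma 12}}{=} \frac{1}{z_\emptyset}M_{\mathcal{T}_2\ominus S}\big(\big\{\tbinom{A_\ell}{b_\ell} * y\big\}_{X,-S\setminus X}\big) \overset{\text{Lemma 11}}{\succeq} 0$$

using that $M_{\mathcal{T}_2}\big(\tbinom{A_\ell}{b_\ell} * y\big)\succeq 0$. Furthermore for $I\subseteq S$ one has

$$w_I = \frac{z_I}{z_\emptyset} = \frac{y_{X\cup I,-S\setminus X}}{y_{X,-S\setminus X}} = \begin{cases} 1 & I\subseteq X\\ 0 & I\cap(S\setminus X)\neq\emptyset\end{cases}$$

□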
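The integrality claims of Lemma 13 can be observed on a toy example. The following sketch is an illustration only: the distribution `dist`, the helper names `subsets`, `moments` and `normalized_conditioning`, and the 0-based indices are assumptions, and the positive semi-definiteness conditions of the lemma are not checked here.

```python
from fractions import Fraction
from itertools import combinations

def subsets(ground):
    """All subsets of `ground`, returned as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def moments(dist, n):
    """y_I = Pr[Z_i = 1 for all i in I], for every I subset of {0, ..., n-1}."""
    return {I: sum(p for x, p in dist.items() if all(x[i] for i in I))
            for I in subsets(range(n))}

def normalized_conditioning(y, X, S):
    """w = z / z_emptyset with z = {y}_{X,-S\\X}; w := 0 if z_emptyset = 0."""
    z = {I: sum((-1) ** len(H) * y[I | X | H] for H in subsets(S - X))
         for I in y}
    z_empty = z[frozenset()]
    if z_empty == 0:
        return {I: Fraction(0) for I in y}
    return {I: Fraction(z[I]) / Fraction(z_empty) for I in y}

n = 3
# Hypothetical distribution over {0,1}^3 (probabilities sum to 1).
dist = {(1, 0, 0): Fraction(1, 4), (1, 0, 1): Fraction(1, 4),
        (0, 1, 1): Fraction(1, 3), (1, 1, 1): Fraction(1, 6)}
y = moments(dist, n)
S, X = frozenset({0, 1}), frozenset({0})   # condition on Z_0 = 1, Z_1 = 0

w = normalized_conditioning(y, X, S)
ones_on_X = all(w[I] == 1 for I in subsets(X))
zeros_off_X = all(w[I] == 0 for I in subsets(S) if I & (S - X))
print(ones_on_X, zeros_off_X, w[frozenset({2})])
```

Variables outside $S$ may stay fractional: here `w[frozenset({2})]` is the conditional probability $\Pr[Z_2 = 1 \mid Z_0 = 1, Z_1 = 0]$ under the assumed distribution.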



A.4 Convergence 

With the last lemma at hand, the convergence of the Lasserre hierarchy follows quickly. 
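Lemma 14. $K\supseteq\mathrm{Las}_0^{\mathrm{proj}}(K)\supseteq\dots\supseteq\mathrm{Las}_n^{\mathrm{proj}}(K)\supseteq\mathrm{conv}(K\cap\{0,1\}^n)$.

Proof. By definition $\mathrm{Las}_t^{\mathrm{proj}}(K)\supseteq\mathrm{Las}_{t+1}^{\mathrm{proj}}(K)$. Let $y\in\mathrm{Las}_0(K)$. Then $0\le\big(M_0(\tbinom{A_\ell}{b_\ell} * y)\big)_{\emptyset,\emptyset} = \sum_{i\in[n]} A_{\ell i}\, y_{\{i\}} - b_\ell$, thus $A(y_{\{1\}},\dots,y_{\{n\}})\ge b$ and $K\supseteq\mathrm{Las}_0^{\mathrm{proj}}(K)$.

Finally let $x\in K\cap\{0,1\}^n$ and define $y\in\mathbb{R}^{\mathcal{P}([n])}$ with $y_I := \prod_{i\in I}x_i$. Then one has $M_n(y) = yy^T\succeq 0$. Furthermore $M_n(\tbinom{A_\ell}{b_\ell} * y) = (A_\ell x - b_\ell)\cdot yy^T\succeq 0$, thus $y\in\mathrm{Las}_n(K)$. □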
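The two matrix identities used for integral points can be verified entry by entry. The sketch below is a hedged illustration: the point `x`, the system `A`, `b`, the helper names and the 0-based indices are hypothetical choices, not data from the paper.

```python
from itertools import combinations
from math import prod

def subsets(ground):
    """All subsets of `ground`, returned as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

n = 3
x = (1, 0, 1)                    # hypothetical integral point
A = [[1, 1, 1], [2, 0, -1]]      # hypothetical system A x >= b
b = [1, 0]
slack = [sum(A[l][i] * x[i] for i in range(n)) - b[l] for l in range(len(b))]
x_is_feasible = all(s >= 0 for s in slack)

index = subsets(range(n))
# y_I is the product of x_i over i in I (the empty product is 1).
y = {I: prod(x[i] for i in I) for I in index}

# M_n(y) = y y^T: entry (I, J) of the moment matrix is y_{I u J}.
moment_is_rank_one = all(y[I | J] == y[I] * y[J] for I in index for J in index)

def slack_entry(l, I, J):
    """Entry (I, J) of the moment matrix of slacks for constraint l."""
    return sum(A[l][i] * y[I | J | {i}] for i in range(n)) - b[l] * y[I | J]

# M_n((A_l, b_l) * y) = (A_l x - b_l) * y y^T, entry by entry.
slack_is_scaled = all(
    slack_entry(l, I, J) == slack[l] * y[I] * y[J]
    for l in range(len(b)) for I in index for J in index
)
print(x_is_feasible, moment_is_rank_one, slack_is_scaled)
```

Both matrices are nonnegative multiples of the rank-one matrix $yy^T$, which is why they are positive semi-definite whenever $x$ is feasible.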

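The following statement implies that $\mathrm{Las}_n^{\mathrm{proj}}(K) = \mathrm{conv}(K\cap\{0,1\}^n)$:

Lemma 15. $\mathrm{Las}_n(K)\subseteq\mathrm{conv}\big\{(\prod_{i\in I}x_i)_{I\subseteq[n]}\ \big|\ x\in\{0,1\}^n : Ax\ge b\big\}$.

Proof. For all $X\subseteq[n]$, define $z^X := \{y\}_{X,-[n]\setminus X}\in\mathbb{R}^{\mathcal{P}([n])}$ and $w^X := \frac{z^X}{z^X_\emptyset}$ if $z^X_\emptyset > 0$. Then

$$y = \sum_{X\subseteq[n]:\ z^X_\emptyset>0} z^X_\emptyset\cdot w^X.$$

This is a convex combination since $\sum_{X\subseteq[n]} z^X_\emptyset = y_\emptyset = 1$ (again by Lemma 10). For a fixed $X$, we abbreviate $x_i := w^X_{\{i\}}$. Then Lemma 13 provides that $w^X\in\mathrm{Las}_n(K)$ (thus $x\in K$); $x\in\{0,1\}^n$ and $w^X_I = \prod_{i\in I}x_i$. □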
A.5 Local consistency

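Let $y\in\mathbb{R}^{\mathcal{P}_t([n])}$ and $t'\le t$. Then $y|_{\mathcal{P}_{t'}([n])}\in\mathbb{R}^{\mathcal{P}_{t'}([n])}$ is the vector that emerges from $y$ after deletion of all entries $I$ with $|I| > t'$. Moreover, for any vector $y\in\mathbb{R}^{\mathcal{T}}$ with $\mathcal{T}\subseteq\mathcal{P}([n])$, we define
the extension $y'\in\mathbb{R}^{\mathcal{P}([n])}$ as the vector $y$ where missing entries are filled with zeros.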

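Lemma 16. Let $y\in\mathrm{Las}_t(K)$ and $S\subseteq[n]$ with $|S|\le t$. Then

$$y\in\mathrm{conv}\big\{w\ \big|\ w|_{\mathcal{P}_{2(t-|S|)+2}([n])}\in\mathrm{Las}_{t-|S|}(K);\ w_{\{i\}}\in\{0,1\}\ \forall i\in S\big\}$$

Proof. Again we can write a convex combination $y' = \sum_{X\subseteq S} z^X_\emptyset\cdot w^X$ (with $w^X\in\mathbb{R}^{\mathcal{P}([n])}$) according to Lemma 10. Recall that $\mathcal{P}_{2t+2}([n])\ominus S\supseteq\mathcal{P}_{2(t-|S|)+2}([n])$, thus Lemma 13 provides
that $w^X|_{\mathcal{P}_{2(t-|S|)+2}([n])}\in\mathrm{Las}_{t-|S|}(K)$ and that $w^X$ is integral on $S$. □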



A.6 The decomposition theorem



Imagine for a second that $y\in\mathrm{Las}_t(K)$ and $y_I = 0$ for all $|I| > t$. Then we can just fill the matrices $M_{t+1}(y)$ and $M_t(\tbinom{A_\ell}{b_\ell} * y)$ with zeros to obtain $M_n(y)$ and $M_n(\tbinom{A_\ell}{b_\ell} * y)$ without destroying
positive semi-definiteness or their consistency. Consequently, $y$ would even be in the convex
hull of feasible integral vectors, even though we only assumed $y$ to be a $t$-round solution.
With a bit more care, this approach applies more generally to any subset of variables.



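Theorem 17 (Decomposition Theorem [KMN11]). Let $0\le k\le t$, $y\in\mathrm{Las}_t(K)$, $X\subseteq S\subseteq V$ so
that $|I\cap S| > k\implies y_I = 0$. Then

$$y\in\mathrm{conv}\big\{w\ \big|\ w|_{2(t-k)+2}\in\mathrm{Las}_{t-k}(K);\ w_{\{i\}}\in\{0,1\}\ \forall i\in S\big\}$$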



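Proof. Again extend $y$ to $y'\in\mathbb{R}^{\mathcal{P}([n])}$. Define $\mathcal{T}_1 := \{A\subseteq[n]\mid |A\setminus S|\le t+1-k\}$ as the set of
indices that have at most $t+1-k$ indices outside of $S$. After sorting the rows and columns
by increasing size of index sets, we can write

$$M_{\mathcal{T}_1}(y') = \begin{pmatrix} M & 0\\ 0 & 0\end{pmatrix}\succeq 0,$$

where the first block of rows and columns is indexed by the sets $J$ with $|J|\le t+1$ and the second block by the sets $J$ with $|J| > t+1$, and where $M$ is a principal submatrix of $M_{t+1}(y)\succeq 0$. Next, observe that there are entries $y'_I$
that may appear inside of $M$ and outside. If they appear outside, say at entry $(J_1,J_2)$, then
$I = J_1\cup J_2$ and $|J_1| > t+1$, but $|J_1\setminus S|\le t+1-k$ (so that $J_1\in\mathcal{T}_1$). Thus $|I\cap S|\ge|J_1\cap S| = |J_1| - |J_1\setminus S| > k$, thus $y'_I = 0$ by assumption.^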

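Analogously for $\mathcal{T}_2 := \{A\subseteq[n]\mid |A\setminus S|\le t-k\}$ one can write

$$M_{\mathcal{T}_2}\big(\tbinom{A_\ell}{b_\ell} * y'\big) = \begin{pmatrix} N & 0\\ 0 & 0\end{pmatrix}\succeq 0$$

(first block indexed by the sets $J$ with $|J|\le t$, second block by the sets $J$ with $|J| > t$)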



^In other words, the matrix $M_{\mathcal{T}_1}(y')$ is consistent in the sense that entries $(J_1,J_2)$ and $(J_3,J_4)$ are identical
whenever $J_1\cup J_2 = J_3\cup J_4$.
with $N\succeq 0$ being a principal submatrix of $M_t(\tbinom{A_\ell}{b_\ell} * y)\succeq 0$. Again, consider an entry $I$ that
appears at least once outside of $N$, thus we can write $I = J_1\cup J_2$ with $|J_1| > t$ and $J_1\in\mathcal{T}_2$, hence
$|J_1\setminus S|\le t-k$. Then for any $I'\supseteq J_1$ one has $|I'\cap S|\ge|J_1\cap S| = |J_1| - |J_1\setminus S|\ge(t+1)-(t-k) > k$
and $y'_{I'} = 0$ by assumption. The matrix entry at position $I$ is $\sum_{i\in[n]} A_{\ell i}\, y'_{I\cup\{i\}} - b_\ell\, y'_I = 0$, hence
the matrix is defined consistently.

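Define $z^X := \{y'\}_{X,-S\setminus X}\in\mathbb{R}^{\mathcal{P}([n])}$ and $w^X := \frac{z^X}{z^X_\emptyset}$ ($w^X := 0$, if $z^X_\emptyset = 0$), then as in Lemma 15
we can express $y'$ as convex combination $y' = \sum_{X\subseteq S} z^X_\emptyset\cdot w^X$. Since $\mathcal{T}_1\ominus S = \mathcal{T}_1$ and $\mathcal{T}_2\ominus S = \mathcal{T}_2$,
we apply Lemma 13 and obtain $M_{\mathcal{T}_1}(w^X)\succeq 0$, $M_{\mathcal{T}_2}(\tbinom{A_\ell}{b_\ell} * w^X)\succeq 0$ for all $\ell\in[m]$ and $w^X_\emptyset = 1$.
Observe that $\mathcal{P}_{t+1-k}([n])\subseteq\mathcal{T}_1$ and $\mathcal{P}_{t-k}([n])\subseteq\mathcal{T}_2$. Consequently $w^X|_{2(t-k)+2}\in\mathrm{Las}_{t-k}(K)$ as
claimed. □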

Now we argue why the Decomposition Theorem implies Property 2.

Lemma (Property 2). Let $K = \{x\in\mathbb{R}^n\mid Ax\ge b\}$ and $y\in\mathrm{Las}_t(K)$ and assume that for a
subset of variables $S\subseteq[n]$ one has $\max\{|I| : I\subseteq S;\ x\in K;\ x_i = 1\ \forall i\in I\}\le k < t$. Then $y\in\mathrm{conv}(\{z\in\mathrm{Las}_{t-k}(K)\mid z_{\{i\}}\in\{0,1\}\ \forall i\in S\})$.

Proof. Consider an index set $I\subseteq S$ with $|I| = k+1\le t$. Then Property 2(c) implies that $y_I = 0$.
For all $|I|,|J|\le t+1$ with $y_I = 0$, inspecting the determinant of the principal submatrix of
$M_{t+1}(y)$ induced by indices $I$ and $J$, we see that

$$0\le\det\begin{pmatrix} 0 & y_{I\cup J}\\ y_{I\cup J} & y_J\end{pmatrix} = -y_{I\cup J}^2,$$

thus $y_{I\cup J} = 0$ (i.e., all entries are monotone). We summarize: All entries $I\subseteq[n]$ with $|I|\le 2(t+1)$ and $|I\cap S| > k$ have $y_I = 0$. Then by the Decomposition Theorem we have

$$y\in\mathrm{conv}\big\{w\ \big|\ w|_{2(t-k)+2}\in\mathrm{Las}_{t-k}(K);\ w_{\{i\}}\in\{0,1\}\ \forall i\in S\big\}.$$

□